Various Sentiments Related to Covid


Group 12: Brandon Hom, Christie Ngo, Pile He, Wesley Tat

Introduction

The idea for this project came from the question: “What exactly are the impacts of coronavirus?” It evolved into a desire to pinpoint sentiments in response to coronavirus and related topics. As coronavirus has plagued the world for the last two years, there has been an ongoing discussion on social media about the public’s responses. With opinions on precautionary measures so polarized, we were interested in the influence of political affiliation on responsive actions, such as getting vaccinated. Lower incidence rates tied to a high vaccination response would suggest that the vaccine is effective and would also show where vaccination campaigns should be focused. We were also curious whether Asians were significantly affected by coronavirus, since it was initially called the “Chinese virus” by former President Donald Trump; we wanted to determine whether people associated Asians with the virus. News reports of crimes against Asians have become more prevalent since the start of the pandemic. Determining the severity and recent trends of crimes committed against Asians will illuminate how urgently legislators should act.

With this in mind, we came up with the following questions:

  1. What is the overall sentiment of social media posts on COVID vaccinations? Specifically, what is the overall sentiment on vaccine mandates and vaccines in general?

  2. Are people’s views on the COVID vaccine related to their political affiliations?

  3. Is there an association between COVID and sentiments towards Asians?

We answer the questions above in the order given, using various APIs, visualization methods, and online documentation.

Sentiment Analysis workflow

For our sentiment analysis, we pulled comments from both subreddits and YouTube videos and first applied TextBlob. We focused on the polarity score that TextBlob returns, which ranges from -1 to 1: scores closer to -1 indicate negative sentiment and scores closer to 1 indicate positive sentiment. However, we found that TextBlob was highly inaccurate at classifying comments as negative or positive, as seen from the comments below. Due to these inaccuracies, we then applied NLTK's Valence Aware Dictionary and sEntiment Reasoner (VADER). In the dataframe shown below, which is based on the Republican subreddit, VADER identifies the actual sentiment of a comment more often than TextBlob does. For example, the comment in the first row is negative, but TextBlob scores it as neutral. TextBlob's score depends mainly on which words appear in a comment, so a comment is scored as negative only if its negative words outweigh its positive ones. VADER also takes into account how the words are used, which lets it judge positive or negative sentiment with better accuracy. To elaborate on TextBlob versus VADER, the comment below received a score of 0.1461 from TextBlob, meaning slightly positive, even though the comment is clearly negative; VADER evaluated it correctly with a score of -0.9825.

From the compound scores that VADER returns, where a negative compound score indicates negative sentiment, we can already see that VADER performs better at determining the sentiment of comments. Another convenience of VADER is that the text needs no preprocessing, since it is designed to classify social media comments, including their punctuation and capitalization. For this reason, we applied VADER first and only then preprocessed the comments for visualization.

When visualizing comments, we showed only the positive or only the negative comments, depending on the sign of the average compound score for that video or subreddit. For example, if the average was negative, we filtered for the comments with negative compound scores and then applied standard text preprocessing to visualize the words in a word cloud.
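The filtering and preprocessing steps described above can be sketched as follows. This is a minimal illustration using only the standard library, not the report's actual code: the stop-word list, the function names, and the sample comments with their scores are all hypothetical, and the word counts stand in for the input a word cloud library would receive.

```python
import re
from collections import Counter
from statistics import mean

# Tiny illustrative stop-word list (assumption: the report does not say which list was used).
STOP_WORDS = {"the", "a", "an", "is", "are", "this", "that", "and", "of", "to", "it", "i", "so"}


def dominant_comments(scored_comments):
    """Keep only the comments whose compound score has the same sign as the mean score."""
    avg = mean(score for _, score in scored_comments)
    if avg < 0:
        return [text for text, score in scored_comments if score < 0]
    return [text for text, score in scored_comments if score > 0]


def word_frequencies(comments):
    """Lowercase, strip punctuation, drop stop words, and count the remaining words."""
    counts = Counter()
    for text in comments:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Hypothetical (comment, compound score) pairs standing in for VADER output.
scored = [
    ("This is fake news!", -0.6),
    ("Lying media, fake survey.", -0.7),
    ("Good to know, thanks.", 0.5),
]

negative = dominant_comments(scored)   # the mean is negative, so only negative comments remain
freqs = word_frequencies(negative)     # word counts that a word cloud would be drawn from
print(freqs.most_common(3))
```

Scoring the raw text first and cleaning it only afterwards keeps the punctuation and capitalization that VADER relies on, while the word cloud sees only the cleaned tokens.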

YouTube Sentiment on Vaccine Mandates

To answer the question on vaccine mandates, we used YouTube's Data API to obtain the top 500 comments for videos that either supported or opposed the vaccine mandate. We requested only the top 500 comments because highly popular social media platforms, such as YouTube, tend to have copious amounts of irrelevant comments. Obtaining all the comments would therefore add many unrelated comments to our dataset, which could skew and invalidate our sentiment analysis. Limiting ourselves to the top 500 comments struck a balance between comment relevance and sample size.
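Collecting a fixed number of comments from a paginated API can be sketched as below. This is a hedged illustration rather than the report's code: `collect_top_comments` and `fake_fetch_page` are hypothetical names, and the fake fetcher only imitates the response shape of the YouTube Data API `commentThreads` endpoint (`items`, `nextPageToken`, and the nested `textDisplay` field). A real fetcher would call that endpoint with `maxResults` of at most 100 and, as an assumption about how "top" was defined, `order="relevance"`.

```python
def collect_top_comments(fetch_page, limit=500):
    """Follow nextPageToken until `limit` top-level comments have been collected."""
    comments, token = [], None
    while len(comments) < limit:
        response = fetch_page(token)
        for item in response.get("items", []):
            snippet = item["snippet"]["topLevelComment"]["snippet"]
            comments.append(snippet["textDisplay"])
        token = response.get("nextPageToken")
        if token is None:  # no more pages available
            break
    return comments[:limit]


def fake_fetch_page(page_token, total=650, page_size=100):
    """Stand-in for a real API call: serves `total` made-up comments, 100 per page."""
    start = int(page_token or 0)
    stop = min(start + page_size, total)
    items = [
        {"snippet": {"topLevelComment": {"snippet": {"textDisplay": f"comment {i}"}}}}
        for i in range(start, stop)
    ]
    response = {"items": items}
    if stop < total:
        response["nextPageToken"] = str(stop)
    return response


top = collect_top_comments(fake_fetch_page)
print(len(top))
```

Passing the fetch function in as an argument keeps the pagination logic testable without an API key or network access.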

When choosing the YouTube videos for our sentiment analysis workflow, we first searched with keywords such as "support" or "block" to find videos supporting or opposing vaccine mandates. We then read the description and watched each video to confirm its viewpoint. The links to these videos are given below.

YouTube videos supporting the vaccine mandate:

YouTube videos against the vaccine mandate:

The sentiment analysis workflow was applied to the videos in the order in which they appear in the lists above. Videos supporting vaccine mandates had an overall negative sentiment, while videos against vaccine mandates had an overall positive sentiment. To get a better idea of the specific words in context, we generated word clouds of the words used most frequently in the comments.

"Majority of workers support Biden's business Covid vaccine mandate: CNBC Survey" is clearly a video supporting the vaccine mandate. The word cloud below shows the negativity of the comments. The most notable words are fake, news, liar, lying, propaganda, and media. In context, these words show that the commenters had a negative view of vaccine mandates: they used mostly negative words, such as lying, to express their distrust of the survey results, probably because they wanted content that opposed the vaccine mandate.

"‘Shameful’: Doctor on Supreme Court’s Covid-19 vaccine mandate decision" is another video supporting the vaccine mandate. In it, a doctor calls the Supreme Court's decision to block the vaccine mandate 'shameful' and argues that there should have been a mandate. Doctors have more medical knowledge than the general public, especially about vaccinations, so a doctor's support for vaccine mandates might be expected to carry extra credibility. Nevertheless, the comments suggest that people disliked his viewpoint, calling his support for mandating vaccines shameful. Adding to the negativity toward the doctor, the words government and propaganda also appear prominently, suggesting that commenters viewed the doctor as having some political affiliation.

"Why You Can Be Forced To Get The Covid Vaccine" is the last video we obtained that supports the vaccine mandate. Here the words were used in a negative context. Forced was used frequently, generally by commenters who despised the idea of being forced to do something. Other notable words are wrong, stop, and constitution. In context, people viewed the vaccine mandate as wrong and justified this view by citing the constitution, specifically the First Amendment.

"Supreme Court Blocks Biden’s Covid Vaccine Mandate For Private Companies" was the first video we obtained that opposes vaccine mandates. Recall that the overall sentiment was positive; the word cloud indeed contains many positive words. The most notable are freedom, free, and constitution. Generally, the comments containing these words viewed the Supreme Court's decision to block the vaccine mandate as justified because the mandate conflicts with the First Amendment and with the idea that America is known for: freedom.

"Supreme Court blocks Biden's vaccine mandate for big businesses" is another video against the vaccine mandate. It is similar to the previous video, except that it focuses on big businesses. The overall sentiment was again positive, and the frequently used words were similar to those of the previous video. We again see many positive words, such as thank and good. Freedom, seen in the bottom right corner, was also used frequently, again suggesting that freedom is given as a reason to oppose the vaccine mandate.

"Police officer quits over vaccine mandate: 'I decided to turn in my badge so I can speak up'" is the last video we obtained that opposes the vaccine mandate. It features a police officer who quit his job because of the mandate. The overall sentiment was very positive. Many of the comments commended the officer for his decision, as seen in words such as right, hero, brave, and proud. Freedom was also used frequently, which shows that the idea of freedom is once again being invoked against the vaccine mandate.

Overall, videos supporting vaccine mandates had negative sentiment, while videos against mandating vaccines had positive sentiment. Taken together, this suggests that the overall sentiment toward mandating vaccines is negative. The comments indicate that people commenting on YouTube see mandating something as a violation of freedom and the constitution. As a result, any video that offered an opinion in favor of the vaccine mandate or tried to justify it received more negative comments, whereas videos that argued against vaccine mandates received more positive comments.

Reddit Sentiment on COVID Vaccines

What is Reddit?

Reddit is a social media website with a massive collection of online forums called subreddits, each of which covers a different topic. For example, for people who like tennis, there is a tennis subreddit with almost 770,000 members. On each subreddit you can post an image, a video, a GIF, a discussion, or anything else related to that subreddit's topic. People who are affiliated with or interested in that topic can join the subreddit and become members.

Subreddits Used

To answer the question of whether people's views on the COVID vaccine relate to their political affiliations, we analyzed the comments of members of various political subreddits. We used the Reddit API and various Reddit scrapers to collect comments from each subreddit, from two years ago to today, as long as they contained keywords such as "covid vaccine" or "vaccine". We wanted a large sample to see what users of each subreddit believed and what their stances on COVID vaccines were. We assumed that anyone who was a part of a subreddit is affiliated with it and holds some ideology related to it. For example, someone who was a part of the Republican subreddit is assumed to share the party's views or be a member of the party.

After collecting the comments from the various subreddits, we ran them through a text-processing stage in which stop words and punctuation were removed. Sentiment analysis was then applied to each comment from each subreddit. We treated negative comments as being against COVID vaccines and positive comments as being for COVID vaccines. We then averaged the scores for each subreddit to see whether its comments about COVID vaccines were negative or positive overall.
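The per-subreddit averaging can be sketched as follows. This is an illustration under stated assumptions, not the report's code: the function names, subreddit names, and scores are made up, and the ±0.05 cutoff for calling an average "neutral" is the conventional VADER threshold, which the report itself does not specify.

```python
from collections import defaultdict
from statistics import mean


def average_by_subreddit(rows):
    """Group (subreddit, compound score) pairs and return the mean score per subreddit."""
    grouped = defaultdict(list)
    for subreddit, compound in rows:
        grouped[subreddit].append(compound)
    return {name: mean(scores) for name, scores in grouped.items()}


def label(average, threshold=0.05):
    """Map an average compound score to a coarse stance (threshold is an assumption)."""
    if average >= threshold:
        return "positive"
    if average <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical compound scores for comments from three subreddits.
rows = [
    ("Republican", -0.4), ("Republican", -0.2),
    ("democrats", 0.02), ("democrats", -0.02),
    ("UCDavis", 0.5), ("UCDavis", 0.3),
]

averages = average_by_subreddit(rows)
stances = {name: label(avg) for name, avg in averages.items()}
print(stances)
```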

After each subreddit was analyzed, we made word clouds for the most important subreddits in our project to see whether each political affiliation had a stance on COVID vaccines.

We also created a graph of the average compound score of the comments from each subreddit in order to compare their sentiment. Most of the subreddits we chose are about political parties or politics in general, where views could range from positive to negative. The liberal subreddit was the most negative toward COVID vaccines, and the UC Davis subreddit was the most positive. It was surprising that liberals, who are supposed to be more open-minded and progressive, had more negative comments toward vaccines. Additionally, Democrats were more neutral on the vaccines than the other subreddits.

The Republican Party in the United States is known for its conservative, right-wing views, which are less progressive and more traditional and favor fewer changes. Many Republicans have become more entrenched with anti-vaccine activists and are more prone to be against the COVID vaccine. This is supported by their comments being negative on average. Additionally, comments in the Republican subreddit were often toxic in content; for example, a widespread rumor claimed that the COVID vaccines contained graphene oxide and were poisonous. This is in line with the party more broadly, as more Republicans than Democrats are unvaccinated. Hence, much of the word cloud consists of these “poison” chemical words, as people tried to slander the vaccine.

The Democratic Party is the biggest political party in the United States and is known for its liberal, left-wing views, which are more progressive, favor change, and seek greater social and economic equality. Since Democrats tend to favor change for the social good, we expected them to be for a COVID vaccine. On average, their comments were neutral, representing a stance neither for nor against the vaccines. The word cloud supports this, as many of the words came from discussion of the vaccines or of positive things about them.

Liberals are on the left wing and are known to be progressive and in favor of change. They seek to create social good and incorporate new ideas. When their average compound score was calculated, it was surprisingly the most negative, suggesting a stance against COVID vaccines.

We created a word cloud to determine whether the negative comments were associated with a view against COVID vaccines. In the word cloud, the most used words were impeach, clorox, impeached, adviser, and hydroxychloroquine. The adviser and impeach comments are presumably about the Chief Medical Advisor to the President of the United States and his stance in favor of social distancing and getting the COVID vaccine as soon as possible. This may indicate that the Liberals were against vaccines and believed them to be dangerous. Refusing was also among the most used words. Although liberals are generally for change and want to see social good, which the vaccine could bring by helping people stay safe from COVID, the data suggest otherwise.

After calculating the average compound score for each subreddit, we saw that UC Davis had the highest positive score. Since it is a college subreddit, our assumption is that college students are more educated and more likely to favor a vaccine that is backed by science and could save lives. Additionally, college students in general are more liberal than conservative, which should align with being for vaccines. [8] To determine whether UC Davis students were for or against the vaccines, we looked through the word cloud to see which words were used.

As it is a college subreddit, it mostly discusses school-related topics such as calculus, midterms, and professors. However, beneficial appeared often in comments about COVID vaccines, alongside discussion of attending school online while waiting for the vaccines. We can say that UC Davis is for the vaccine, which makes sense, as students are more educated about vaccines and the benefits they could bring.

Overall, the subreddits were divided by affiliation on whether they were for or against the vaccine. For example, the Republican subreddit was against the vaccines and talked about how they were poisonous. This is most likely caused by the party's beliefs and views. Subreddits whose political beliefs lean left or right tend to be for or against the vaccine, respectively. However, this is not always the case, as seen with the Liberals, who had the most negative average compound score and showed some stance against the vaccines. We also have to consider the bias that each subreddit is only a small portion of its party; subreddits do not account for every person who holds those political beliefs.

Political Response to COVID Vaccines

Social Background:

When considering how a person responds to the pandemic, whether it be wearing masks in public or getting the COVID vaccine, we were interested in learning about how one's political affiliation may play a role in their willingness to adhere to state mandates. After looking at several subreddits (specific communities on Reddit) on COVID-related posts, there seems to be a difference in sentiment towards the pandemic for Republicans and Democrats. In order to get a comprehensive understanding of the differences in response, we analyzed data throughout the course of the entire pandemic in addition to recent statistics.

Method:

Looking at state data through several APIs, we pulled the relevant information into dataframes. The CDC provides an extensive number of datasets that can be retrieved from the Open Data Source API. For the number of cases specifically, we used the Covid Act Now API to obtain the time series for two of the largest states in the United States with polarized political values, California (Democratic) and Texas (Republican).

To evaluate the relationship between vaccination rate and political affiliation, we referred to the CDC's data on vaccine hesitancy for counties in America along with the majority political party identification of each state from Wikipedia's Political party strength in U.S. states page.

Results

Time Series for CA vs. TX

We used a line plot to examine the incidence rate over the course of the pandemic; the growth rate at each point in time is the slope of the curve. Line plots are well suited to time series data and clearly show the changes in case growth each quarter starting from January 2020.

We performed a Student's t-test to see whether there is a true difference in mean incidence rates between the two states. The negative t-statistic, -2.307, indicates that California had a lower mean incidence rate over the course of the pandemic. Because our p-value of 0.0212 is below the conventional 0.05 significance level, we conclude that the two states did not have the same mean incidence rate.
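For reference, the statistic behind a two-sample Student's t-test can be sketched with the standard library alone. This is an illustration on made-up numbers, not the report's data or code; `pooled_t_statistic` is a hypothetical helper, and it assumes equal variances in the two groups. Turning the statistic into a p-value requires the t distribution, which a library such as SciPy (`scipy.stats.ttest_ind`) provides.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof


# Made-up incidence rates for two states (first sample plays the role of California).
state_a = [1, 2, 3, 4, 5]
state_b = [3, 4, 5, 6, 7]

t_stat, dof = pooled_t_statistic(state_a, state_b)
print(round(t_stat, 3), dof)
```

A negative statistic means the first sample has the lower mean, which is how the sign of the report's t-statistic is read.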

Vaccine Hesitancy

The CDC provided information on county hesitancy rates based on Household Pulse Survey responses. A response of "1" corresponds to "definitely getting the vaccine" and a response of "5" means "definitely not getting the vaccine." We took the average hesitancy over the counties of each state using pandas' groupby.

The COVID Data Tracker from the CDC also establishes the number of doses delivered for each state.

Wikipedia includes several tables highlighting different political strengths for states in the U.S. We specifically chose to look at the political party identification majority in each state.

As expected, states with higher hesitancy also had lower doses delivered. There is a strong negative correlation between hesitancy and number of vaccine doses delivered overall. Almost 80% of the variation in number of doses delivered is explained by vaccine hesitancy.
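The "variation explained" figures quoted in this section correspond to the squared correlation coefficient in a simple linear fit. A minimal sketch of that relationship, using only the standard library and made-up numbers (not the CDC data), is shown below; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up state-level values: higher hesitancy paired with fewer doses delivered.
hesitancy = [1, 2, 3, 4, 5]
doses = [10, 8, 7, 4, 1]

r = pearson_r(hesitancy, doses)
r_squared = r ** 2  # share of the variation in doses explained by hesitancy
print(round(r, 3), round(r_squared, 3))
```

The sign of `r` gives the direction of the association, while `r_squared` is the proportion reported as "variation explained".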

There is a strong positive correlation between political affiliation and mean estimated hesitancy. Republican-majority states generally had higher hesitancy towards getting the COVID vaccine. About 60% of the variation in hesitancy is explained by the state's majority registered political party.

We now consider the incidence rates as of March 5 for all states.

There is a positive correlation between vaccine hesitancy and incidence rates for states in the US. Close to 60% of the variation in incidence rates is explained by mean vaccine hesitancy.

When considering more states, there is still a positive correlation between incidence rates and political party. Around 50% of the variation in incidence rates is explained by political party.

Observations and Comments

Overall, there appears to be an association between political affiliation and response to COVID measures. Texas had a higher mean incidence rate than California over the course of the pandemic (January 2020 to now). Additionally, political party seems to correlate with incidence rates when considering recent numbers. Political party's association with vaccine hesitancy may explain the difference in the number of vaccine doses delivered between Republican and Democratic states. If the COVID vaccine is indeed effective, the lower number of doses delivered also plays a role in the incidence rates of states.

Visualizing association between Asian hate and covid-19

Social background:

On May 8, 2020, United Nations Secretary-General Antonio Guterres said that “the pandemic continues to unleash a tsunami of hate and xenophobia, scapegoating and scare-mongering” and urged governments to “act now to strengthen the immunity of our societies against the virus of hate.” In this part, we want to visualize the relationship between COVID-19 and discrimination toward Asians.

Methodology:

We quantified discrimination toward Asians as the number of articles that included the keyword "anti-Asian" in each month from 2018 to 2022, with the 2018 and 2019 data serving as a reference. We then quantified the impact of COVID-19 using the number of COVID-19 deaths and the number of COVID-19 cases in each month since January 2020 (202001), when COVID-19 was first reported in the US.

We used the NYTimes API (https://developer.nytimes.com/apis) to search for articles that included the word "anti-Asian". Each request returns at most 10 results for a given page, so we wrote a function that continuously requests pages and stops when a page returns no results. The function counts the results on each page for a given month, e.g. 20190101 to 20190201, and sums them to get the total number of anti-Asian articles for that month.
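The page-by-page counting described above can be sketched as follows. This is a hedged illustration, not the report's function: `month_windows`, `count_articles`, and `fake_fetch_page` are hypothetical names, and the fake fetcher stands in for a real request to the NYTimes Article Search API, which would pass the query, the `begin_date` and `end_date` of the window, and the `page` number, then return the list of documents in the response.

```python
from datetime import date


def month_windows(start_year, end_year):
    """Yield (begin, end) date strings in YYYYMMDD form, one pair per calendar month."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            begin = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            yield begin.strftime("%Y%m%d"), end.strftime("%Y%m%d")


def count_articles(fetch_page):
    """Request pages 0, 1, 2, ... and sum the result counts until a page comes back empty."""
    total, page = 0, 0
    while True:
        docs = fetch_page(page)
        if not docs:
            return total
        total += len(docs)
        page += 1


def fake_fetch_page(page, hits=23, page_size=10):
    """Stand-in for one API request: 23 made-up articles served 10 per page."""
    articles = [f"article {i}" for i in range(hits)]
    return articles[page * page_size:(page + 1) * page_size]


windows = list(month_windows(2019, 2019))
total = count_articles(fake_fetch_page)
print(windows[0], total)
```

In a full run, `count_articles` would be called once per window from `month_windows`, with a fetcher bound to that window's dates.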

We obtained the number of deaths and cases from the official NYTimes GitHub repository (https://github.com/nytimes/covid-19-data).

Then we combined these two datasets in a plot to visualize their relationship.

Result

Number of anti-Asian articles vs. number of COVID-19 cases

Observations and Comments

The number of articles that included the word "anti-Asian" clearly increased after the start of the COVID-19 pandemic. There is no clear relationship between the number of COVID-19 cases per month and the number of anti-Asian articles. However, a very significant increase in the number of anti-Asian articles follows each peak in the number of COVID-19 deaths (202003 and 202011), which indicates an increase in discrimination toward Asians after a period with a large number of COVID-19 deaths. Based on these features, we conclude that COVID-19 does increase discrimination toward Asians.

Conclusion

Overall, the sentiment on vaccine mandates on YouTube was negative due to perceived conflicts with the constitution and freedom. It is important to note, however, that we only included the most-viewed videos relevant to vaccine mandates; the overall sentiment may be different when more videos are included, but we were unable to include more due to time and the length of the report.

The subreddits we chose and analyzed were based on politics or political parties. We found that Reddit was divided by affiliation and that members held prior beliefs relating to vaccines. The left wing was more for the vaccine, except for liberals, and the right wing was against the vaccine. However, we should spend more time checking comments thoroughly and include more subreddits to get a bigger sample size before determining whether the general consensus was for or against vaccines.

Additionally, it is also worth noting that social media platforms are not a reliable source, so the overall sentiment of a particular platform should not be extrapolated to conclude the general population’s view.

When considering one’s background for their response to COVID measures, we discovered that there was a correlation between political affiliation and the willingness to get vaccinated. In the narrow scope of only the biggest state for each party (California vs. Texas), we found that the mean incidence rates over the course of the pandemic were different; Texas had a significantly higher mean rate. After considering more states' responses to COVID, we found noticeable associations between the majority political identification of states and the mean vaccine hesitancy of their counties. Additionally, there was a positive correlation between incidence rates and hesitancy; thus, we can conclude that higher COVID vaccination rates are correlated with lower numbers of cases. Looking at political affiliation and mean hesitancy response will reveal which states (and specifically which of their counties) need the most enforcement for the COVID vaccine in order to lower their incidence rates. For future consideration, we could plot the vaccination rate over time for each state, grouped by the state's majority political party, to allow more comparison and discussion.

According to our visualization for question 3, discrimination toward the Asian community has increased during COVID-19. Furthermore, an increase in COVID-19-related deaths correlates with an increase in the number of articles containing the word "anti-Asian", which we use to quantify discrimination toward Asians. Thus, policies should be made to combat anti-Asian bias and violence, and extra funding should be assigned to better protect the Asian community during the COVID-19 pandemic. For future consideration, we could produce similar plots for different countries to better understand how COVID-19 affects discrimination against Asians in different countries.

Source Code

Youtube code

Text preprocessing functions

Reddit code

Vaccine Response Code

Asian Hate code

References

  1. How to Extract YouTube Comments Using Youtube API – Python, https://www.geeksforgeeks.org/how-to-extract-youtube-comments-using-youtube-api-python/

  2. Python Lambda, https://www.w3schools.com/python/python_lambda.asp

  3. Youtube API, https://developers.google.com/youtube/v3/docs/commentThreads

  4. VADER Sentiment Analyzer, Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

  5. Reddit API, https://www.reddit.com/dev/api/

  6. Reddit Scraper #1, https://github.com/pushshift/api

  7. Reddit Scraper #2, https://praw.readthedocs.io/en/stable/

  8. Are Students Liberal? Yes - But Not Everywhere, https://www.realcleareducation.com/articles/2020/11/25/are_students_liberal_yes__but_not_everywhere_110512.html

  9. Covid Act Now API, https://covidactnow.org/data-api

  10. United States COVID-19 Cases and Deaths by State over Time, https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36

  11. COVID Data Tracker, https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total

  12. NYTimes API, https://developer.nytimes.com/apis

  13. NYTimes Covid-19 GitHub, https://github.com/nytimes/covid-19-data